A REVERSION OF THE CHERNOFF BOUND 
Abstract. This paper describes the construction of a lower bound for the
tails of general random variables, using solely knowledge of their moment
generating function. The tilting procedure used allows for the construction of
lower bounds that are tighter and more broadly applicable than existing tail
approximations.



1. Introduction 

This paper presents and solves a nonlinear optimization problem arising in the
construction of lower bounds for the tails of distributions which possess a moment
generating function on an open subset of $\mathbb{R}^{+}$. The resulting lower bounds
complement the classical Chernoff (upper) bound [5] in a set of cases more general than
has been previously achieved.

The methodology for the construction of the bounds was motivated by the
presentation of the lower bound in Cramér's large deviations theorem on p. 29 of [11].
An earlier version of the results presented here was used to establish a lower
bound to the asymptotic convergence rate for an algorithm for global optimization.

The bounds presented here share numerous methodological characteristics with
the development of saddlepoint approximations [2, 9]. Both schemes use tilting, a
technique first developed by Esscher, in order to center the power series expansions
at the desired tail of the distribution. Our method concentrates on the restriction
of the Laplace transform on the real line. A nonlinear optimization problem is
constructed by adding two degrees of freedom to the tilting procedure. This allows
us to obtain tighter lower bounds, which hold even in cases where existing lower
bounds break down.

The same direction was explored independently in [1], where a rough lower bound
is computed using some rudiments of the methodology utilized here. We parameterize
the problem more efficiently, thereby arriving at a significantly tighter lower
bound. The tools used in this paper are also very similar to
those employed by Vinogradov [12], in that they both explore beyond the realm
of applicability of the Cramér condition. The main difference between our work
and Vinogradov's lies in the different questions we ask. Vinogradov assumes tail
properties and extends classical large deviations results for sums of random
variables under conditions not covered by classical techniques. On the other hand, we
are interested in inferring tail estimates under minimal conditions on the Laplace
transform. In that sense, despite the similarity in techniques with [12], our logic is
more akin to that employed in [1] and [11].

Throughout the paper it is assumed that we have access to estimates of $\Xi(\cdot)$,
the Legendre dual of the cumulant transform (logarithm of the Laplace
transform). In the next section we introduce the new lower bound, represented as a
two-dimensional constrained nonlinear optimization problem. The third section
offers comparisons with three alternative lower bounds. Following that, we proceed
to solve the nonlinear optimization problem, thus arriving at efficient
numerical estimates. We include a figure which illustrates the comparison of the new
lower bound with existing alternatives.

2. A Lower Bound to Complement the Chernoff Bound 

Let $X$ be a real-valued, positive random variable on a probability space $(\mathcal{X}, \mu)$.
Assume that $X$ has exponential moments with respect to $\mu$, i.e. $g(\xi) = E^{\mu}[e^{\xi X}] < \infty$
for $\xi$ in some open set $(-\infty, \xi^*)$, where $\xi^* > 0$. Let the rate function be given
by the Legendre transform of the cumulant (i.e. the logarithm of the moment
generating function), $I_\mu(y) = \sup_{\xi}\{\xi y - \log g(\xi)\}$. Further, let

$$\Xi(y) = \begin{cases} \arg\sup_{\xi > 0}\{\xi y - \log g(\xi)\} & \text{if } y > E^{\mu}[X], \\ \arg\sup_{\xi < 0}\{\xi y - \log g(\xi)\} & \text{if } y < E^{\mu}[X] \end{cases}$$

be the corresponding 'Legendre dual'. It is well known [10] that $\Xi$ is an increasing
concave function with $\Xi(E^{\mu}[X]) = 0$, and thus it is generally invertible. Using
integration by parts we notice that

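$$\frac{d}{dy}\, g(\Xi(y)) = y\, g(\Xi(y))\, \Xi'(y) \;\Longrightarrow$$

(2.1)    $g(\Xi(y)) = \exp\left\{ y\,\Xi(y) - \int_{E^{\mu}[X]}^{y} \Xi(t)\,dt \right\}.$

The above formula leads to the following concise representation of the rate function:

(2.2)    $I_\mu(y) = \int_{E^{\mu}[X]}^{y} \Xi(t)\,dt.$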




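Of course, for any $\xi < \xi^*$ with $\xi = \Xi(x)$, the change of variables $\lambda = \Xi(t)$ gives

$$\int_{E^{\mu}[X]}^{x} \Xi(t)\,dt = \int_{0}^{\Xi(x)} \frac{\lambda\, d\lambda}{\Xi'(\Xi^{-1}(\lambda))}$$

and

$$\Xi'(\Xi^{-1}(\lambda)) = \frac{g^2}{g g'' - g'^2} = \left( \frac{d}{d\lambda} \frac{g'}{g} \right)^{-1},$$

with $g$ and its derivatives evaluated at $\lambda$, so that

(2.3)    $\int_{E^{\mu}[X]}^{x} \Xi(t)\,dt = x\,\Xi(x) - \log g(\Xi(x)).$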

However, the integral representation of the exponent in (2.1) has two advantages.
Firstly, it does not depend explicitly on the moment generating function, creating
the possibility of constructing the tail bounds we are after using only the Legendre
dual, $\Xi$, side-stepping the moment generating function. We will explore this idea
in a subsequent paper. Secondly, the integral representation of the exponent in
(2.1) allows us to combine rate functions in a straightforward manner by adding
the corresponding Legendre duals.

An application of the Markov inequality to the random variable $\exp\{\xi X\}$ suffices
to obtain a surprisingly accurate upper bound to the tails of $X$, the celebrated
Chernoff bound (assuming $y > E^{\mu}[X]$):

$$\mu(X > y) = \inf_{\xi > 0} \mu\left( e^{\xi X} > e^{\xi y} \right) \le \inf_{\xi > 0} \frac{E^{\mu}[e^{\xi X}]}{e^{\xi y}} \le \exp\{-I_\mu(y)\}.$$

Note that this bound works equally well when $y < E^{\mu}[X]$, by considering the
left hand tail instead of the right hand one. Specifically we obtain

$$\mu(X < y) = \mu(-X > -y) = \inf_{\xi < 0} \mu\left( e^{\xi X} > e^{\xi y} \right),$$

which leads to the same upper bound for the left hand tail as the one for the right
hand tail we obtained above in the case $y > E^{\mu}[X]$.
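As a concrete illustration of the Chernoff bound, the following minimal sketch assumes that $X$ is exponentially distributed with unit rate. This distribution is chosen here only for illustration and is an assumption of the example. In that case $g(\xi) = 1/(1-\xi)$ for $\xi < 1$, the rate function is $I_\mu(y) = y - 1 - \log y$, and the exact tail is $e^{-y}$. All function names are hypothetical.

```python
import math

def mgf_exp1(xi):
    """Moment generating function of a unit-rate exponential, valid for xi < 1."""
    return 1.0 / (1.0 - xi)

def chernoff_upper(y, steps=10_000):
    """Grid approximation of inf over 0 < xi < 1 of g(xi) * exp(-xi * y)."""
    best = float("inf")
    for k in range(1, steps):
        xi = k / steps  # xi ranges over the open interval (0, 1)
        best = min(best, mgf_exp1(xi) * math.exp(-xi * y))
    return best

def chernoff_closed_form(y):
    """exp(-I(y)) with I(y) = y - 1 - log(y), the rate function of Exp(1)."""
    return math.exp(-(y - 1.0 - math.log(y)))

if __name__ == "__main__":
    for y in (2.0, 5.0, 10.0):
        # columns: y, exact tail, grid infimum, closed-form Chernoff bound
        print(y, math.exp(-y), chernoff_upper(y), chernoff_closed_form(y))
```

The grid search only approximates the infimum from above, so it can never fall below the closed-form value; the gap between the bound and the exact tail is what the lower bound of this paper is meant to control from the other side.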

This short exposition of the Chernoff bound emphasizes the optimizing degrees
of freedom afforded to us by the free parameter $\xi$. The estimates in the rest of the
paper involve the construction of a lower bound to accompany the Chernoff bound.

The tools we use in the construction of the desired lower bound are exponential
tilting and the incorporation of a second optimizing degree of freedom. The former
leads to a centering of the measure around the tail of interest [11]. The latter
allows us to tailor the tilting procedure in order to optimize the iterative use of the
Chernoff bound to the tilted measure.

In particular, let $\nu_\xi \in \mathcal{M}_1(\mathcal{X})$ be a new probability measure on $\mathcal{X}$ defined by
$\nu_\xi(dt) = g(\xi)^{-1} e^{\xi t} \mu(dt)$. This is called an exponentially tilted measure, after the
theory of Esscher tilting [9]. To simplify the notation, we will use $I_\alpha$ to signify
$I_{\nu_{\Xi(\alpha y)}}$, $\nu_\alpha$ for $\nu_{\Xi(\alpha y)}$ and $E^{\alpha}$ for $E^{\nu_{\Xi(\alpha y)}}$. Observe that $I_\mu = I_{E^{\mu}[X]/y}$. Finally, let

$$L(\alpha, \delta, y) = \left( 1 - e^{-I_\alpha(\delta y)} - e^{-I_\alpha(y)} \right) \exp\left\{ -I_\mu(\alpha y) - \Xi(\alpha y)\, y\, (\delta - \alpha) \right\}.$$

With this terminology we are in a position to state the proposed reversion of the
Chernoff bound as a nonlinear constrained optimization in two dimensions:

Theorem 2.1. For any $y > E^{\mu}[X]$, the following inequality holds:

(2.4)    $\mu(X > y) \ge \sup_{1 < \alpha < \delta} L(\alpha, \delta, y).$

Moreover, there exist feasible values of $\alpha$ and $\delta$ which make the right hand side of
(2.4) strictly positive.

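Proof. Following the traditional proof of Cramér's theorem [11], we let $Y$ be a
$\nu_\alpha$-distributed random variable. Observe that, for any $\alpha$,

$$E^{\alpha}[Y] = \frac{1}{g(\Xi(\alpha y))} \int_{-\infty}^{\infty} t\, e^{t\,\Xi(\alpha y)}\, \mu(dt) = \left. \frac{d}{d\xi} \log g(\xi) \right|_{\xi = \Xi(\alpha y)} = \alpha y.$$

Then, for every $1 < \alpha < \delta$ we have

$$\mu(X > y) = g(\Xi(\alpha y)) \int_{y}^{\infty} e^{-t\,\Xi(\alpha y)}\, \nu_\alpha(dt)$$
$$\ge g(\Xi(\alpha y)) \int_{y}^{\delta y} e^{-t\,\Xi(\alpha y)}\, \nu_\alpha(dt)$$
$$\ge g(\Xi(\alpha y))\, e^{-\delta y\,\Xi(\alpha y)}\, \nu_\alpha([y, \delta y]).$$

We are now in a position to apply the Chernoff bound iteratively, as it were, to
estimate the last term on the right hand side. Specifically, we observe that, for
any $\delta > \alpha > 1$, $\nu_\alpha(Y > \delta y) \le \exp\{-I_\alpha(\delta y)\}$, because $\delta y > E^{\alpha}[Y] = \alpha y$, and
$\nu_\alpha(Y < y) \le \exp\{-I_\alpha(y)\}$, because $y < E^{\alpha}[Y] = \alpha y$. Consequently we can
estimate the last term on the right hand side of the chain of inequalities above as

(2.5)    $\nu_\alpha([y, \delta y]) = 1 - \nu_\alpha(Y > \delta y) - \nu_\alpha(Y < y) \ge 1 - e^{-I_\alpha(\delta y)} - e^{-I_\alpha(y)}.$

Substituting (2.5) into the chain of inequalities above, and using (2.1) and (2.2) to write
$g(\Xi(\alpha y)) = \exp\{\alpha y\,\Xi(\alpha y) - I_\mu(\alpha y)\}$, we see that, for any $1 < \alpha < \delta$, $\mu(X > y) \ge L(\alpha, \delta, y)$.
Noting that the left hand side does not depend on $\alpha$ or $\delta$, we conclude that the
inequality is maintained if we maximize the right hand side with respect to $\alpha$ and
$\delta$, thus obtaining (2.4).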

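In order to evaluate the rate function for the tilted measure we observe that, for
any $\theta$,

$$E^{\alpha}[e^{\theta Y}] = \frac{g(\theta + \Xi(\alpha y))}{g(\Xi(\alpha y))}.$$

Thus, $\Xi_\alpha(\delta y) = \arg\sup_\theta \{\delta y\,\theta - \log E^{\alpha}[e^{\theta Y}]\}$ must satisfy

$$\delta y = \frac{d}{d\theta} \log \frac{g(\theta + \Xi(\alpha y))}{g(\Xi(\alpha y))} = \left. \frac{d}{d\zeta} \log g(\zeta) \right|_{\zeta = \theta + \Xi(\alpha y)},$$

and therefore

$$\Xi_\alpha(\delta y) = \Xi(\delta y) - \Xi(\alpha y).$$

Thus, using (2.2) in this situation we obtain

(2.6)    $I_\alpha(\delta y) = \int_{E^{\alpha}[Y]}^{\delta y} \Xi_\alpha(t)\,dt = y(\alpha - \delta)\,\Xi(\alpha y) + \int_{\alpha y}^{\delta y} \Xi(t)\,dt$

and

(2.7)    $I_\alpha(y) = \int_{E^{\alpha}[Y]}^{y} \Xi_\alpha(t)\,dt = y(\alpha - 1)\,\Xi(\alpha y) - \int_{y}^{\alpha y} \Xi(t)\,dt.$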

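We are now in a position to show that there exists a feasible choice of $\alpha$ and $\delta$ such
that $e^{-I_\alpha(\delta y)} + e^{-I_\alpha(y)} < 1$, thus ensuring that the right hand side of (2.4) is strictly
positive. Specifically, observe that the monotonicity of $\Xi(\cdot)$ leads to

$$\frac{\partial}{\partial \delta} I_\alpha(\delta y) = y\,(\Xi(\delta y) - \Xi(\alpha y)) > 0$$
$$\frac{\partial^2}{\partial \delta^2} I_\alpha(\delta y) = y^2\, \Xi'(\delta y) > 0$$

which implies that

(2.8)    $\lim_{\delta \to \infty} I_\alpha(\delta y) = \infty.$

On the other hand,

$$\frac{\partial}{\partial \alpha} I_\alpha(y) = y^2 (\alpha - 1)\, \Xi'(\alpha y) > 0,$$

which, together with the observation that $I_{\alpha = 1}(y) = 0$, implies that there exist an
$\epsilon > 0$ and $1 < \alpha^* < \infty$ such that $I_{\alpha^*}(y) = \epsilon$. Choose a $\delta^* < \infty$ such that
$I_{\alpha^*}(\delta^* y) > -\log(1 - e^{-\epsilon})$. We can always do that because of (2.8). Then, manifestly,
$e^{-I_{\alpha^*}(y)} + e^{-I_{\alpha^*}(\delta^* y)} < 1$ and therefore the right hand side of (2.4) is strictly positive. □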
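To make the statement of Theorem 2.1 tangible, the sketch below evaluates $L(\alpha, \delta, y)$ under the assumption, made only for this example, that $X$ is a unit-rate exponential. Then $\Xi(t) = 1 - 1/t$, $I_\mu(t) = t - 1 - \log t$, and the tilted measure $\nu_\alpha$ is again exponential with mean $\alpha y$, so that $I_\alpha(x) = r - 1 - \log r$ with $r = x/(\alpha y)$. The crude grid search is a stand-in for illustration and is not the optimization procedure developed later in the paper; all names are hypothetical.

```python
import math

def xi_dual(t):
    """Legendre dual of Exp(1): Xi(t) = 1 - 1/t."""
    return 1.0 - 1.0 / t

def rate_mu(t):
    """Rate function of Exp(1): I_mu(t) = t - 1 - log(t)."""
    return t - 1.0 - math.log(t)

def rate_tilted(alpha, x, y):
    """I_alpha(x); for Exp(1) the tilted measure is exponential with mean alpha*y."""
    r = x / (alpha * y)
    return r - 1.0 - math.log(r)

def lower_bound(alpha, delta, y):
    """L(alpha, delta, y) as defined before Theorem 2.1."""
    a = math.exp(-rate_tilted(alpha, y, y))
    b = math.exp(-rate_tilted(alpha, delta * y, y))
    exponent = -rate_mu(alpha * y) - xi_dual(alpha * y) * y * (delta - alpha)
    return (1.0 - a - b) * math.exp(exponent)

def best_on_grid(y, n=200):
    """Crude grid search for the sup over 1 < alpha < delta (illustration only)."""
    best = (-float("inf"), 0.0, 0.0)
    for i in range(1, n + 1):
        alpha = 1.0 + 0.05 * i
        for j in range(1, n + 1):
            delta = alpha * (1.0 + 0.05 * j)
            best = max(best, (lower_bound(alpha, delta, y), alpha, delta))
    return best

if __name__ == "__main__":
    value, alpha, delta = best_on_grid(5.0)
    print(value, alpha, delta, math.exp(-5.0))  # grid lower bound vs exact tail
```

For many pairs $(\alpha, \delta)$ the prefactor $1 - e^{-I_\alpha(\delta y)} - e^{-I_\alpha(y)}$ is negative and the bound is vacuous; the theorem only asserts that some feasible pair makes it strictly positive.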
Note that the lower bound in (2.4) does not depend on explicit knowledge of
the moment generating function $g$. Indeed, (2.2), (2.6) and (2.7) show that all the
components of (2.4) can be computed directly from $\Xi(\cdot)$. This opens the possibility,
discussed further in the Conclusions, that no precise knowledge, or perhaps even
existence, of the moment generating function may be required for (2.4).

Furthermore, observe that the lower bound in (2.4) can be described without the
use of any integrals. Specifically, using (2.3) with (2.2), (2.6) and (2.7) we obtain

$$I_\mu(\alpha y) = \alpha y\, \Xi(\alpha y) - \log g(\Xi(\alpha y))$$
$$I_\alpha(y) = y\,(\Xi(y) - \Xi(\alpha y)) + \log \frac{g(\Xi(\alpha y))}{g(\Xi(y))}$$

Thus we see that all the components of $L(\alpha, \delta, y)$, and thus of (2.4), can be expressed
without the need for any integrals. The integral representations shown above serve
to do away with the explicit dependence on the moment generating function $g$.
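The agreement between the integral representation (2.7) and the integral-free expression for $I_\alpha(y)$ can be checked numerically. The sketch below again assumes a unit-rate exponential, for which $\Xi(t) = 1 - 1/t$ and $g(\Xi(t)) = t$; the Simpson quadrature and all function names are choices made for this illustration only.

```python
import math

def xi_dual(t):
    """Legendre dual of Exp(1): Xi(t) = 1 - 1/t."""
    return 1.0 - 1.0 / t

def log_mgf_at_dual(t):
    """log g(Xi(t)) = log(t) for Exp(1), because g(xi) = 1 / (1 - xi)."""
    return math.log(t)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def rate_tilted_integral(alpha, y):
    """I_alpha(y) computed from the integral representation (2.7)."""
    return y * (alpha - 1.0) * xi_dual(alpha * y) - simpson(xi_dual, y, alpha * y)

def rate_tilted_integral_free(alpha, y):
    """I_alpha(y) computed from the integral-free expression."""
    return (y * (xi_dual(y) - xi_dual(alpha * y))
            + log_mgf_at_dual(alpha * y) - log_mgf_at_dual(y))

if __name__ == "__main__":
    for alpha, y in ((1.5, 2.0), (3.0, 5.0)):
        print(rate_tilted_integral(alpha, y), rate_tilted_integral_free(alpha, y))
```

In this example both expressions reduce to $1/\alpha - 1 + \log\alpha$, which does not depend on $y$.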



3. Comparison with Existing Lower Bounds 

At this point it is worthwhile to compare the lower bound in Theorem 2.1 to
three other approximations. The first one is Daniels' saddlepoint approximation
which, using our notation, is given by [6, 7]

(3.1)    $\mu(X > y) \approx (2\pi)^{-1/2} \int_{y}^{\infty} \sqrt{\Xi'(t)}\; e^{-I_\mu(t)}\,dt.$

Compared to (2.4) in Theorem 2.1, (3.1) has the disadvantage that it involves an
extra integral. In that sense, the Chernoff bound and (2.4) can be thought of as upper
and lower bounds to the integral expression in (3.1).

Second, we look at the lower bound proposed by Bagdasarov and Ostrovskii
[1]. While they use different notation, their methodology is very close to the one
presented here. Specifically, their lower bound to $\mu(X > y)$ has only one free
parameter, $\Delta > 0$, which, using the notation in the current paper, is equivalent
to $1 - \Xi(y)/\Xi(\alpha y)$. As in the proof of Theorem 2.1 above, their lower bound
needs access to a point $y_+ > y$. This point corresponds to the point $\delta y$ in our
notation. They describe this point as the point where the function $\Lambda(1 + \Delta)y - I(y)$
attains its supremum, where $\Lambda$ takes the place of $\Xi(\alpha y)$ in the notation used here. But the supremum
of the function $\Lambda(1 + \Delta)y - I(y)$ over $y$ is the Legendre transform of $I(y)$, which
is itself the Legendre transform of $\log g(\xi)$. Using Legendre duality we conclude
that $y_+ = \delta y = \Xi^{-1}(2\Xi(\alpha y) - \Xi(y))$. With these notational translations, the
Bagdasarov-Ostrovskii (B-O) lower bound can be described as

(3.2)    $\mu(X > y) \ge \sup_{\alpha > 1} L(\alpha, y),$

where

$$L(\alpha, y) = \frac{1 - \dfrac{\Xi(\alpha y)}{\Xi(\alpha y) - \Xi(y)} \left[ e^{-I_\alpha(\delta(\alpha, y)\, y)} + e^{-I_\alpha(y)} \right]}{1 - e^{-I_\alpha(\delta(\alpha, y)\, y)} - e^{-I_\alpha(y)}}\; L(\alpha, \delta(\alpha, y), y)$$

and

$$\delta(\alpha, y) = y^{-1}\, \Xi^{-1}\left( 2\Xi(\alpha y) - \Xi(y) \right).$$
So we can see immediately that the B-O lower bound is inferior to the one
described in Theorem 2.1 for two reasons. On the one hand it foregoes one of the two
optimizing degrees of freedom (making $\delta$ a function of $\alpha$, which we will recognize
as a suboptimal choice in the following section). Furthermore, the term
$\frac{\Xi(\alpha y)}{\Xi(\alpha y) - \Xi(y)}$, which exceeds 1 because $\Xi(y) > 0$ for $y > E^{\mu}[X]$,
makes the fraction on the right hand side of the expression for $L$ strictly less than
1.

Also, [1] does not provide a general statement about the range of applicability
of the B-O lower bound as Theorem 2.1 does. It turns out that there are cases of
interest where the B-O lower bound is inapplicable. While the B-O lower bound
is less tight than (2.4) and its range of applicability is not as broad, [1] has the
advantage of presenting their lower bound assuming only approximate knowledge
of the moment generating function $g(\xi)$. By contrast, the current paper assumes
that we have complete knowledge of the moment generating function. It turns
out that this is not necessary. Motivated by Bagdasarov and Ostrovskii's work, we
extend the results presented here to the more general case of only approximate
knowledge of the moment generating function in a follow-up paper.

In the same spirit is the lower bound presented in [11]. The construction of
Stroock's lower bound is very similar to the one we present here, and in fact our
presentation mirrors his. The main difference lies with Stroock's use of the Chebyshev
inequality to bound $\nu_\alpha([y, \delta y])$, as opposed to our iterative use of the Chernoff
bound. The symmetry of the Chebyshev inequality around the mean determines
one of the two optimizing degrees of freedom, and consequently Stroock's lower
bound can be described as:

(3.3)    $\mu(X > y) \ge \sup_{\alpha > 1} L(\alpha, y),$

where

$$L(\alpha, y) = \frac{1 - \dfrac{1}{\Xi'(\alpha y)\, y^2 (\alpha - 1)^2}}{1 - e^{-I_\alpha((2\alpha - 1) y)} - e^{-I_\alpha(y)}}\; L(\alpha, 2\alpha - 1, y).$$

The first disadvantage of (3.3) when compared to (2.4) is the fact that it lacks one
degree of freedom, whose optimization could only improve the latter. Secondly,
unlike (2.4), which is guaranteed to work in general by Theorem 2.1, the range
of applicability of (3.3) is limited by the requirement that $\Xi'(\alpha y)\, y^2 (\alpha - 1)^2 > 1$.
There are indeed applications of interest (which will be discussed in a subsequent
section) that do not conform with this requirement, and for which therefore (3.3)
is inapplicable. In particular, one such application involves $\Xi(t) = c - t^{-1}$ for some
constant $c > 0$. One readily concludes that this choice makes $\Xi'(\alpha y)\, y^2 (\alpha - 1)^2 =
\left(1 - \frac{1}{\alpha}\right)^2 < 1$, thus invalidating (3.3).
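The failure of the applicability condition can be worked out in one line. The computation below assumes the dual $\Xi(t) = c - 1/t$, which is, for instance, the Legendre dual of an exponential distribution with rate $c$ (since then $g(\xi) = c/(c - \xi)$ and $\frac{d}{d\xi}\log g(\xi) = 1/(c - \xi)$):

```latex
\Xi(t) = c - \frac{1}{t}
\;\Longrightarrow\;
\Xi'(t) = \frac{1}{t^{2}}
\;\Longrightarrow\;
\Xi'(\alpha y)\, y^{2} (\alpha - 1)^{2}
  = \frac{y^{2} (\alpha - 1)^{2}}{\alpha^{2} y^{2}}
  = \left( 1 - \frac{1}{\alpha} \right)^{2} < 1
\quad \text{for all } \alpha > 1 .
```

The value does not depend on $y$ or on $c$, so under this assumption the Chebyshev-based requirement fails at every point of the tail.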

Finally, even in its range of applicability, (3.3) is less tight than (2.4). In order
to see this, let us consider the first order approximation to $\Xi(\cdot)$ around $\alpha y$:

(3.4)    $\Xi(t) \approx \Xi(\alpha y) + \Xi'(\alpha y)(t - \alpha y),$

and thus, using (2.6) and (2.7), we obtain

(3.5)    $I_\alpha(y) \approx \Xi'(\alpha y) \int_{\alpha y}^{y} (t - \alpha y)\,dt = \frac{\Xi'(\alpha y)\, y^2 (\alpha - 1)^2}{2}$

and

(3.6)    $I_\alpha((2\alpha - 1) y) \approx \frac{\Xi'(\alpha y)\, y^2 (\alpha - 1)^2}{2}.$

The following lemma shows that, under the linear approximation to $\Xi(\cdot)$, when
(3.3) is valid, the ratio on the right hand side of the expression for $L$ is less than 1
for $y$ large enough.

Lemma 3.1. Fix $\alpha > 1$ and $y$ such that $I_\alpha(y) > \frac{1}{2}$. Assume that $\Xi(\cdot)$ is a linear
function. Then, for $y$ large enough,

(3.7)    $\Xi'(\alpha y)\, y^2 (\alpha - 1)^2 \left[ e^{-I_\alpha((2\alpha - 1) y)} + e^{-I_\alpha(y)} \right] < 1.$

Proof. Using (3.5) and (3.6) we can rewrite the expression on the left hand side of
(3.7) as $4 I_\alpha(y)\, e^{-I_\alpha(y)}$. Consider the function $4we^{-w}$; it is clear that for $w$ large
enough (in particular $w > 2.16$ would suffice), $4we^{-w} < 1$.
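The threshold quoted for $w$ can be checked with a few lines of code. The sketch below locates, by bisection, the larger root of $4we^{-w} = 1$; the bracketing interval $[1, 5]$ is a choice made for this illustration, relying on the fact that $4we^{-w}$ is decreasing for $w > 1$.

```python
import math

def f(w):
    """The function 4 * w * exp(-w) appearing in the proof of Lemma 3.1."""
    return 4.0 * w * math.exp(-w)

def upper_root(lo=1.0, hi=5.0, tol=1e-12):
    """Bisection for the root of f(w) = 1 on [lo, hi], where f is decreasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 1.0:
            lo = mid   # still above 1, so the root lies to the right
        else:
            hi = mid   # at or below 1, so the root lies to the left
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(upper_root(), f(2.16))
```

The root lies slightly above 2.15, which is consistent with the sufficient value 2.16 used in the proof.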

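Observe that $\frac{d}{dt} I_\alpha(t) = \Xi(t) - \Xi(\alpha y)$, which is zero only at $t = \alpha y$. This
unique critical point is a minimum since $\left. \frac{d^2}{dt^2} I_\alpha \right|_{t = \alpha y} = \Xi'(\alpha y) > 0$, because
$\Xi(\cdot)$ is a concave non-decreasing function [11]. Therefore, $\inf_t I_\alpha(t) = I_\alpha(\alpha y)$
$\alpha y(\alpha - 1)\Xi(\alpha y)$, which is monotonically increasing with $y$. Thus $\lim_{y \to \infty} \inf_t I_\alpha(t) <$
$\lim_{y \to \infty} I_\alpha(y) = \infty$.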

Putting the last two statements together we see that, indeed, for large enough
$y$, the left hand side of (3.7) is strictly less than 1. □

Lemma 3.1 immediately implies that, even when (3.3) is applicable, it is less
tight than (2.4) far enough in the tail.

4. A Nonlinear Optimization Problem 

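We are now in a position to compute the lower bound presented in an implicit
way in (2.4). In order to arrive at an explicit computation, we need to solve
the optimization over the two parameters, $\alpha$ and $\delta$, that determine the tightest
achievable lower bound.

The first step in our computation is the reduction of the optimization in (2.4)
to one variable. In what follows we will use the following symbols to simplify the
presentation:

$$A(\alpha, y) = \exp\{-I_\alpha(y)\}$$
$$B(\alpha, \delta, y) = \exp\{-I_\alpha(\delta y)\}$$

We proceed by evaluating the first order condition with respect to $\alpha$:

$$\frac{\partial L}{\partial \alpha}(\alpha, \delta, y) = -\delta y^2 \Xi'(\alpha y) L + \alpha y^2 \Xi'(\alpha y) L - \frac{L\, y^2\, \Xi'(\alpha y)}{1 - A - B} \left\{ (\delta - \alpha) B - (\alpha - 1) A \right\} = 0$$

$$\Longrightarrow \quad \frac{(1 - \alpha) A + (\delta - \alpha) B}{1 - A - B} = \alpha - \delta \quad \Longrightarrow$$

(4.1)    $\delta^*(\alpha, y) = \frac{\alpha - A}{1 - A}.$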
The following lemma describes the properties of the resulting optimum choice of $\delta$:

Lemma 4.1. For every $y$, $\delta^*$ is a quasiconvex function of $\alpha$ which attains a unique
minimum at some $\bar\alpha(y) \in (1, \infty)$.

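Proof. Observe that

$$\lim_{\alpha \to 1^+} A = \exp\left\{ \lim_{\alpha \to 1^+} \left[ -y(\alpha - 1)\,\Xi(\alpha y) + \int_{y}^{\alpha y} \Xi(t)\,dt \right] \right\} = 1$$

and

(4.2)    $A_\alpha = \frac{\partial A}{\partial \alpha} = -A\, y^2 (\alpha - 1)\, \Xi'(\alpha y).$

Also, $\lim_{\alpha \to \infty} A = 0^+$, because the concavity of $\Xi$ [10] implies $\lim_{\alpha \to \infty} \Xi'(\alpha y) < \infty$
and therefore $\lim_{\alpha \to \infty} \frac{\partial}{\partial \alpha} \log A = -\infty$, which implies that $\log A$ tends to $-\infty$, and therefore $A$
approaches 0 from above, as $\alpha$ tends to infinity. The proof of the lemma will take
three steps. The first step is to show that, for any $y$,

(4.3)    $\lim_{\alpha \to 1^+} \delta^* = \lim_{\alpha \to 1^+} \frac{1 - A_\alpha}{-A_\alpha} = \lim_{\alpha \to 1^+} \frac{1}{A\, y^2 (\alpha - 1)\, \Xi'(\alpha y)} = +\infty.$

The second step is to observe that $\lim_{\alpha \to \infty} \delta^* = +\infty$. The final step of the proof
involves

(4.4)    $\frac{\partial \delta^*}{\partial \alpha} = \frac{1 - A - A\, y^2 (\alpha - 1)^2\, \Xi'(\alpha y)}{(1 - A)^2}.$

From (4.3) we conclude that $\lim_{\alpha \to 1^+} \frac{\partial \delta^*}{\partial \alpha} = -\infty$. Therefore, for every $y$, there must
exist an $\bar\alpha(y) \in (1, \infty)$ such that $\frac{\partial \delta^*}{\partial \alpha}(\bar\alpha(y), y) = 0$ and, for all $\alpha \in (\bar\alpha(y), \infty)$,
$\frac{\partial \delta^*}{\partial \alpha}(\alpha, y) > 0$. Indeed, differentiating (4.4) with respect to $\alpha$ we see that

(4.5)    $\frac{\partial^2 \delta^*}{\partial \alpha^2} = \frac{A\, y^2 (\alpha - 1)}{(1 - A)^2} \left[ (\Xi'(\alpha y))^2 y^2 (\alpha - 1)^2 + \Xi'(\alpha y) - y(\alpha - 1)\, \Xi''(\alpha y) \right] + \frac{2 A\, y^2 (\alpha - 1)\, \Xi'(\alpha y)}{1 - A}\, \frac{\partial \delta^*}{\partial \alpha}.$

The concavity of $\Xi$ together with (4.5) show us that $\frac{\partial \delta^*}{\partial \alpha} = 0 \Rightarrow \frac{\partial^2 \delta^*}{\partial \alpha^2} > 0$, implying
that each critical point of $\delta^*$ is a minimum, and therefore there is a unique minimum
and $\delta^*$ is quasiconvex. □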
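The shape of $\delta^*$ described in Lemma 4.1 can be observed numerically. The sketch below assumes, purely for illustration, a unit-rate exponential, for which $A = e^{1 - 1/\alpha}/\alpha$ does not depend on $y$ and (4.1) gives $\delta^*$ in closed form. It also checks by a central finite difference that $L$ is stationary in $\alpha$ when $\delta$ is frozen at $\delta^*(\alpha)$. All function names are hypothetical.

```python
import math

def a_term(alpha):
    """A = exp(-I_alpha(y)) for Exp(1), where I_alpha(y) = 1/alpha - 1 + log(alpha)."""
    return math.exp(1.0 - 1.0 / alpha) / alpha

def delta_star(alpha):
    """Equation (4.1): delta* = (alpha - A) / (1 - A)."""
    a = a_term(alpha)
    return (alpha - a) / (1.0 - a)

def lower_bound(alpha, delta, y):
    """L(alpha, delta, y) for Exp(1), using closed-form rate functions."""
    r = delta / alpha
    b = math.exp(-(r - 1.0 - math.log(r)))          # B = exp(-I_alpha(delta*y))
    t = alpha * y
    exponent = -(t - 1.0 - math.log(t)) - (1.0 - 1.0 / t) * y * (delta - alpha)
    return (1.0 - a_term(alpha) - b) * math.exp(exponent)

def sign_changes(values):
    """Number of sign changes in the successive differences of a sequence."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    signs = [d > 0 for d in diffs if d != 0.0]
    return sum(s != t for s, t in zip(signs, signs[1:]))

if __name__ == "__main__":
    alphas = [1.05 + 0.05 * k for k in range(400)]
    curve = [delta_star(a) for a in alphas]
    k_min = min(range(len(curve)), key=curve.__getitem__)
    print(alphas[k_min], curve[k_min], sign_changes(curve))
```

A single sign change in the successive differences is what a curve that first decreases and then increases produces on a grid, in line with the quasiconvexity asserted by the lemma.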

Using the resulting expression for $\delta^*$ as a function of $\alpha$, we can rewrite the lower
bound $L$ and the expression $B$ above as functions solely of $\alpha$ and $y$:

$$L(\alpha, y) = L(\alpha, \delta^*(\alpha, y), y)$$
$$B(\alpha, y) = \exp\{-I_\alpha(y\, \delta^*(\alpha, y))\}$$

Using this terminology, we observe that:

Lemma 4.2. For every $y$ there exists a unique $\hat\alpha(y) \in (1, \bar\alpha(y))$ such that
$A(\hat\alpha(y), y) + B(\hat\alpha(y), y) = 1$ and, for all $\alpha \in (1, \hat\alpha)$, $A(\alpha, y) + B(\alpha, y) < 1$.

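Proof. Notice that

(4.6)    $B_\alpha = \frac{\partial B}{\partial \alpha} = B y \left\{ y(\delta^* - \alpha)\, \Xi'(\alpha y) - \frac{\partial \delta^*}{\partial \alpha} \left[ \Xi(\delta^* y) - \Xi(\alpha y) \right] \right\}.$

By the concavity of $\Xi$ and Lemma 4.1 we can see that $B_\alpha > 0$ and therefore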

(4.7) lim B >0. 
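Using (4.2) and (4.6), after some algebra we arrive at

(4.8)    $\frac{\partial}{\partial \alpha}(A + B) = y^2\, \Xi'(\alpha y)(\delta^* - \alpha) \left\{ A + B - 1 - \frac{B\, \frac{\partial \delta^*}{\partial \alpha} \left[ \Xi(\delta^* y) - \Xi(\alpha y) \right]}{y\, \Xi'(\alpha y)(\delta^* - \alpha)} \right\}.$

Since the term outside the brackets on the right hand side of (4.8) is always
non-negative, we conclude, using Lemma 4.1, that the following statements are true:

(1) $\exists\, \alpha \in (1, \bar\alpha)$: $\frac{\partial}{\partial \alpha}(A + B) = 0 \Rightarrow A(\alpha) + B(\alpha) < 1$.
(2) $\exists\, \alpha \in (\bar\alpha, \infty)$: $\frac{\partial}{\partial \alpha}(A + B) = 0 \Rightarrow A(\alpha) + B(\alpha) > 1$.
(3) $\exists\, \alpha^* \in (1, \bar\alpha)$: $A(\alpha^*) + B(\alpha^*) = 1 \Rightarrow \frac{\partial}{\partial \alpha}(A + B) > 0$.
(4) $\exists\, \alpha^* \in (\bar\alpha, \infty)$: $A(\alpha^*) + B(\alpha^*) = 1 \Rightarrow \frac{\partial}{\partial \alpha}(A + B) < 0$.

Also, using (4.3), we see that

$$\lim_{\alpha \to 1^+} B = \exp\left\{ \lim_{\delta \to \infty} \delta \left[ y\,\Xi(y) \left( 1 - \frac{1}{\delta} \right) - \frac{1}{\delta} \int_{y}^{\delta y} \Xi(t)\,dt \right] \right\} = \exp\left\{ \lim_{\delta \to \infty} \delta y \left[ \Xi(y) - \Xi(\delta y) \right] \right\} = 0^+,$$

whether $\lim_{t \to \infty} \Xi(t) < \infty$ or $\lim_{t \to \infty} \Xi(t) = \infty$. This deduction rules out a maximum
of $A + B$ before $\bar\alpha$, because by statement (1) above, any critical point of $A + B$
before $\bar\alpha$ leads to $A + B < 1$ and therefore must be a minimum. Furthermore, the
combination of statements (2) and (4) above imply that, if $A + B - 1$ lacks a zero below
$\bar\alpha$, then it cannot have a minimum below $\bar\alpha$, because when $A + B < 1$ for all $\alpha < \bar\alpha$,
the slope of $A + B$ will be negative for all $\alpha < \bar\alpha$. Therefore, one of the following
two statements must hold:

(i) Either $A + B > 1$ for all $y$ and $\alpha > 1$, or

(ii) There exists a zero of $A + B - 1$, $\hat\alpha$, in $(1, \bar\alpha)$ such that $\frac{\partial}{\partial \alpha}(A + B) > 0$.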

It turns out that we can rule out case (i). Specifically, for every $\alpha$ and $y$, let
$\tilde\delta(\alpha, y) > \alpha$ be such that $A(\alpha, y) + B(\alpha, \tilde\delta(\alpha, y), y) = 1$. Clearly,

$$\frac{\partial B}{\partial \delta} = -B y \left\{ \Xi(\delta y) - \Xi(\alpha y) \right\} < 0.$$

Thus, if there is no $\alpha \in (1, \bar\alpha)$ with $A + B < 1$, then for all $\alpha \in (1, \bar\alpha)$ and every $y$,
$\delta^*(\alpha, y) < \tilde\delta(\alpha, y)$, which implies that

$$L_\alpha = \frac{\partial L}{\partial \alpha} = \frac{L\, y^2\, \Xi'(\alpha y)}{1 - A - B} \left[ (\alpha - A) - \delta(1 - A) \right] < 0.$$

But this would imply that the maximum lower bound $L$ is achieved on $\tilde\delta$, which
leads to $\max L = 0$. This clearly contradicts the statement of Theorem 2.1, which
asserts that, for any $y$, there exists a pair of values for $\alpha$ and $\delta$ making $L$ strictly
positive. Thus, we are left with statement (ii) as the only viable possibility, which
establishes the desired result. □

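Let $G(\alpha, \delta, y) = B(\alpha, \delta, y)\, \Xi(\delta y) - (1 - A(\alpha, y))\, \Xi(\alpha y)$.

Lemma 4.3. For any $\alpha > 1$ and $y$, there exists a unique $\hat\delta(\alpha, y) \in (\alpha, \infty)$ such
that $G(\alpha, \hat\delta(\alpha, y), y) = 0$. Moreover, $\frac{\partial G}{\partial \delta}(\alpha, \hat\delta(\alpha, y), y) < 0$.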
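Proof. From the definition of $G$ we see that

$$G_\delta = \frac{\partial G}{\partial \delta} = -B y \left\{ \Xi(\delta y) \left[ \Xi(\delta y) - \Xi(\alpha y) \right] - \Xi'(\delta y) \right\}$$

and

$$G_{\delta\delta} = \frac{\partial^2 G}{\partial \delta^2} = -y \left[ \Xi(\delta y) - \Xi(\alpha y) \right] G_\delta - B y^2 \left\{ \Xi'(\delta y) \left[ 2\Xi(\delta y) - \Xi(\alpha y) \right] - \Xi''(\delta y) \right\},$$

which implies that $G_\delta > 0 \Rightarrow G_{\delta\delta} < 0$. Observe that $G_\delta(\alpha, \alpha, y) = B y\, \Xi'(\alpha y) > 0$.
Also, using (4.7) we can see that for large enough $\delta$, $G_\delta(\alpha, \delta, y) < 0$, because the
concavity of $\Xi$ forces $\Xi'$ to remain uniformly bounded. Thus, for any $\alpha > 1$ and $y$,
$G(\alpha, \cdot, y)$ has a unique maximum $\bar\delta(\alpha, y) \in (\alpha, \infty)$. Notice that $G(\alpha, \alpha, y) = A\, \Xi(\alpha y)$
because $B(\alpha, \alpha, y) = 1$. Also, for any $\alpha > 1$ and $y$,

$$\lim_{\delta \to \infty} B(\alpha, \delta, y) = \exp\left\{ \lim_{\delta \to \infty} \delta \left[ y\,\Xi(\alpha y) \left( 1 - \frac{\alpha}{\delta} \right) - \frac{1}{\delta} \int_{\alpha y}^{\delta y} \Xi(t)\,dt \right] \right\} = \exp\left\{ \lim_{\delta \to \infty} \delta y \left[ \Xi(\alpha y) - \Xi(\delta y) \right] \right\} = 0^+,$$

whether $\lim_{t \to \infty} \Xi(t) < \infty$ or $\lim_{t \to \infty} \Xi(t) = \infty$. Therefore, $G$ indeed possesses a zero,
$\hat\delta(\alpha, y)$. Naturally, $\hat\delta > \bar\delta$, and therefore $\frac{\partial G}{\partial \delta}(\alpha, \hat\delta(\alpha, y), y) < 0$. Finally, this zero is
unique because for there to be another, there first must exist a minimum, which is
prohibited by the preceding. □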



We are in a position to prove the main theorem of this section:

Theorem 4.4. For every $y$ there exists an $\alpha^*(y) \in (1, \hat\alpha(y))$ which attains the unique
maximum of $L$.

Proof. We have already seen that, for any $\alpha > 1$ and any $y$, $L_\alpha(\alpha, \delta^*(\alpha, y), y) = 0$.
We can also see that

$$L_\delta = \frac{\partial L}{\partial \delta} = \frac{L\, G\, y}{1 - A - B}$$

and thus for any $\alpha > 1$ and any $y$, $L_\delta(\alpha, \hat\delta(\alpha, y), y) = 0$. The discussion at the end
of the proof of Lemma 4.2 guarantees, for any $y$, the existence of an intersection,
$\alpha^* \in (1, \hat\alpha(y))$, between the two curves, $\delta^*(\cdot, y)$ and $\hat\delta(\cdot, y)$. The only question that
remains is the uniqueness of this intersection and consequently of the maximum for
$L$. Observe that, for any $y$,



Gl^a,Sj=0=^B (^a, 6j E [ydj = (1 - A)E{ay) 
dA 

-- -g^E{ay)=y{l-A)E'{ay) 



d6_ 

da 



y'^^{ay)^'{ay) 


A{a - 1)(1 - A){de2ta - a) 


+ yE\ay){l-A) 


B (a, (j) 


E (y6^ + yE' (yd) 






where the second line arises from differentiating both sides of the first line. At the
intersection $\alpha^*$ of $\delta^*$ and $\hat\delta$ we can simplify (4.9) and obtain



y— 



'B.(ay) 



~.'[yS(a.,y)) 
y S{ySia,y)) 



> 0. 



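On the other hand, since $\alpha^* < \hat\alpha < \bar\alpha$, we know that $\frac{\partial \delta^*}{\partial \alpha} < 0$. Therefore,

$$\frac{\partial \delta^*}{\partial \alpha} < \frac{\partial \hat\delta}{\partial \alpha}$$

and therefore the intersection of $\delta^*$ and $\hat\delta$ must be unique. □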

Figure 1 shows the lower bound resulting from the procedure described in this
paper, compared to the alternatives discussed above, the Chernoff (upper) bound
and the exact tail, which can be readily computed in this case. This example
was chosen in the range where the alternative lower bounds are also applicable, to
allow for a comparison. As we saw above, the new lower bound does not have the
limitations in its range of applicability that plague the alternative lower bounds.
We can see in Figure 1 that the new lower bound maintains a consistent gap
from the Chernoff bound and the exact tail, unlike the alternative lower bounds.



5. Conclusions and Future Steps 

We have shown a way to construct a lower bound to complement the Chernoff
bound that is applicable generically, without the restrictions on the moment generating
function that characterized earlier inequalities of a similar nature. We were able
to represent the bound as the solution of a two-dimensional nonlinear optimization
problem, and proved the existence and uniqueness of the solution.

As we saw earlier, the new lower bound has two technical advantages, aside from
its broad applicability. Specifically, unlike the saddlepoint approximation, it can be
formulated without the need for any integrals. Alternatively, it can be formulated
using only $\Xi(\cdot)$, which makes it easy to combine across iid sequences. From a
theoretical perspective, the main advantage of the new lower bound is that it does
not depend on an appeal to the law of large numbers, which often does not hold in
situations of interest.

At this point it is natural to investigate the asymptotic properties (in the spirit of 
0]) of the new lower bound, including its asymptotic gap from the Chernoff bound. 
It is reasonable to expect that one can classify moment generating functions relative 
to the resulting asymptotic gaps. While there are obvious examples with no gap 
(e.g. Gaussian) and some with a gap a complete classification is still out of 
reach. Extensions to the multivariate case are another natural next step, along the 
lines of OEI for the saddlepoint approximation. 

Furthermore, one may inquire whether the new lower bound, as well as the
Chernoff bound, can be extended to situations where the moment generating function
is not precisely known [1] or does not exist at all. The former case will be dealt
with in a follow-up paper. More generally, the fact that, as shown in this paper,
bilateral tail bounds are achievable with reference only to $\Xi(\cdot)$ lends support to the
latter possibility. It is also plausible to substitute the exponential function with other
ones, more appropriate for different distributions. An investigation of this question
remains open at the moment.